8 research outputs found

    BrAPI-an application programming interface for plant breeding applications

    Motivation: Modern genomic breeding methods rely heavily on very large amounts of phenotyping and genotyping data, presenting new challenges in effective data management and integration. Recently, the size and complexity of datasets have increased significantly, with the result that data are often stored on multiple systems. As analyses of interest increasingly require aggregation of datasets from diverse sources, data exchange between disparate systems becomes a challenge. Results: To facilitate interoperability among breeding applications, we present the public plant Breeding Application Programming Interface (BrAPI). BrAPI is a standardized web service API specification. The development of BrAPI is a collaborative, community-based initiative involving a growing global community of over a hundred participants representing several dozen institutions and companies. Development of such a standard is recognized as critical to a number of important large breeding system initiatives as a foundational technology. The focus of the first version of the API is on providing services for connecting systems and retrieving basic breeding data including germplasm, study, observation, and marker data. A number of BrAPI-enabled applications, termed BrAPPs, have been written that take advantage of the emerging support of BrAPI by many databases.
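To make the idea of a standardized web service API concrete, here is a minimal Python sketch of how a client might build a BrAPI request and unpack the response. It assumes the `metadata`/`result` response envelope with a `pagination` object and a `data` list; the base URL, the sample payload, and the helper names `build_brapi_url` and `extract_records` are hypothetical, and the exact call names and fields should be checked against the BrAPI specification of the target server.

```python
import json
from urllib.parse import urlencode


def build_brapi_url(base_url, call, **params):
    """Build the URL for a BrAPI call (e.g. 'germplasm') under a base URL.

    The base URL is an assumption; each BrAPI server publishes its own.
    """
    url = base_url.rstrip("/") + "/" + call.lstrip("/")
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return url + ("?" + query if query else "")


def extract_records(payload):
    """Return (records, pagination) from a BrAPI-style response envelope."""
    body = json.loads(payload) if isinstance(payload, str) else payload
    pagination = body.get("metadata", {}).get("pagination", {})
    records = body.get("result", {}).get("data", [])
    return records, pagination


# Hypothetical response, hand-written for illustration only.
SAMPLE_RESPONSE = """
{
  "metadata": {
    "pagination": {"pageSize": 2, "currentPage": 0, "totalCount": 2, "totalPages": 1},
    "status": [],
    "datafiles": []
  },
  "result": {
    "data": [
      {"germplasmDbId": "g1", "germplasmName": "Example accession A"},
      {"germplasmDbId": "g2", "germplasmName": "Example accession B"}
    ]
  }
}
"""

if __name__ == "__main__":
    url = build_brapi_url("https://example.org/brapi/v1", "germplasm", page=0, pageSize=2)
    print(url)
    records, pagination = extract_records(SAMPLE_RESPONSE)
    for record in records:
        print(record["germplasmDbId"], record["germplasmName"])
```

Because every call shares the same envelope, the same `extract_records` helper could in principle serve germplasm, study, observation, and marker calls alike, which is the kind of interoperability the abstract describes.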

    A template framework for environmental timeseries data acquisition

    Environmental timeseries data variety is exploding in the Internet of Things era, making data reuse a very demanding task. Data acquisition and integration remains a laborious step of the environmental data lifecycle. Environmental data heterogeneity is a persistent issue, as data are becoming available through different protocols and stored under diverse, custom formats. In this work, we deal with syntactic heterogeneity in environmental timeseries data. Our approach is based on describing different dataset syntaxes using abstract representations, called templates. We designed and implemented EDAM (Environmental Data Acquisition Module), a template framework that facilitates timeseries data acquisition and integration. EDAM templates are written using programming language-agnostic semantics, and can be reused both for input and output, thus enabling data reuse via transformations across different formats. We demonstrate EDAM generality in seven case studies, which involve scraping online data, extracting observations from a relational database, or aggregating historical timeseries stored in local files. Case studies span different environmental sciences domains, including meteorology, agriculture, urban air quality and hydrology. We also demonstrate EDAM for data dissemination, as instructed by output templates. We identified several syntactic interoperability challenges through the case studies, including differences in the formatting of observables, temporal and spatial references, and metadata documentation, and addressed them with EDAM. The EDAM implementation has been released under an open-source license.
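The template idea can be illustrated with a small Python sketch in which one placeholder syntax serves both for reading and for writing. This is not EDAM's actual template language; the `{name}` placeholder syntax, the sample line, and the helper names are assumptions made purely for illustration.

```python
import re


def template_to_regex(template):
    """Compile a template such as '{timestamp};{station}' into a regex
    with one named group per placeholder."""
    # re.split with a capturing group yields literal, name, literal, name, ...
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)       # literal text between placeholders
        else:
            pattern += f"(?P<{part}>.+?)"    # placeholder becomes a named group
    return re.compile("^" + pattern + "$")


def parse_line(template, line):
    """Parse one line with an input template; return a dict, or None on mismatch."""
    match = template_to_regex(template).match(line.strip())
    return match.groupdict() if match else None


def render(template, record):
    """Write one record using an output template with the same placeholder syntax."""
    return template.format(**record)


def convert(lines, input_template, output_template):
    """Transform lines from one syntax to another, skipping lines that do not match."""
    records = (parse_line(input_template, line) for line in lines)
    return [render(output_template, r) for r in records if r is not None]


if __name__ == "__main__":
    source = ["2020-01-01T00:00;A1;3.5", "not a data line"]
    print(convert(source,
                  "{timestamp};{station};{temperature}",
                  "{station},{timestamp},{temperature}"))
```

Keeping the input and output descriptions in the same notation is what lets a single template be reused in either direction, so converting between two formats only requires pairing two templates.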

    A case-study for improved reusability of plant phenotyping data with MIAPPE

    Accompanying datasets for the manuscript "A case-study for improved reusability of plant phenotyping data with MIAPPE". The ZIP archives in this repository contain the source files and the output files that this manuscript refers to.

    Towards an air pollution health study data management system - A case study from a smoky Swiss railway

    In air pollution health studies, measurements are conducted intensively but only periodically at numerous locations in a variety of environments (indoors, outdoors, personal). Often a variety of instruments are used to measure various pollutants ranging from gases (e.g., CO, NO2, O3, VOCs, PAHs) to particulate matter (e.g., particles smaller than 2.5 µm: PM2.5, PM10, ultrafine particles: UFP), and including other environmental parameters such as temperature, relative humidity, and GPS position. As a result, it is always a significant challenge for researchers to effectively QA/QC, combine, and archive these data so as to reliably assess people’s exposure to poor air quality. With the CEDAR system presented here we aim to provide a solution to this problem by employing a platform that uses templates for easily reading custom-formatted files, applies rules for filtering and quality-checking measurements, and ultimately publishes them as services on the web. The system is demonstrated for the case of an air quality project conducted in a Swiss railway station where smoking is allowed.

    Toward better data sharing methods for genebanks

    The conservation of plant genetic resources (PGR) is an important task that requires collaborative effort from many stakeholders. For this, common means of data exchange and effective methods to profit from the collected information need to be established. In this paper, we describe a demonstrator promoting findability of PGR, according to the FAIR (Findable, Accessible, Interoperable & Reusable) data principles. PGR providers can each expose their germplasm information, using the FAO Multicrop Passport Descriptor (MCPD), which subsequently can be queried in a distributed manner via a single user interface. PGR users can select among predefined questions, for example for specific crops, accessions or phenotypes. On the back end, data integration from a distributed query is achieved through annotations with the MCPD semantics.

    Extracting knowledge networks from plant scientific literature : potato tuber flesh color as an exemplary trait

    Background: Scientific literature carries a wealth of information crucial for research, but only a fraction of it is present as structured information in databases and therefore can be analyzed using traditional data analysis tools. Natural language processing (NLP) is often and successfully employed to support humans by distilling relevant information from large corpora of free text and structuring it in a way that lends itself to further computational analyses. For this pilot, we developed a pipeline that uses NLP on biological literature to produce knowledge networks. We focused on the flesh color of potato, a well-studied trait with known associations, and we investigated whether these knowledge networks can assist us in formulating new hypotheses on the underlying biological processes. Results: We trained an NLP model based on a manually annotated corpus of 34 full-text potato articles, to recognize relevant biological entities and relationships between them in text (genes, proteins, metabolites and traits). This model detected the number of biological entities with a precision of 97.65% and a recall of 88.91% on the training set. We conducted a time series analysis on 4023 PubMed abstracts of plant genetics-based articles which focus on 4 major Solanaceous crops (tomato, potato, eggplant and capsicum), to determine that the networks contained both previously known and contemporaneously unknown leads to subsequently discovered biological phenomena relating to flesh color. A novel time-based analysis of these networks indicates a connection between our trait and a candidate gene (zeaxanthin epoxidase) already two years prior to explicit statements of that connection in the literature. Conclusions: Our time-based analysis indicates that network-assisted hypothesis generation shows promise for knowledge discovery, data integration and hypothesis generation in scientific research.

    Enabling reusability of plant phenomic datasets with MIAPPE 1.1

    Enabling data reuse and knowledge discovery is increasingly critical in modern science, and requires an effort towards standardising data publication practices. This is particularly challenging in the plant phenotyping domain, due to its complexity and heterogeneity. We have produced the MIAPPE 1.1 release, which enhances the existing MIAPPE standard in coverage, to support perennial plants, in structure, through an explicit data model, and in clarity, through definitions and examples. We evaluated MIAPPE 1.1 by using it to express several heterogeneous phenotyping experiments in a range of different formats, to demonstrate its applicability and the interoperability between the various implementations. Furthermore, the extended coverage is demonstrated by the fact that one of the datasets could not have been described under MIAPPE 1.0. MIAPPE 1.1 marks a major step towards enabling plant phenotyping data reusability, thanks to its extended coverage, and especially the formalisation of its data model, which facilitates its implementation in different formats. Community feedback has been critical to this development, and will be a key part of ensuring adoption of the standard.